ABSTRACT

We provide the first comprehensive analysis of
any transcription factor family in Cryptosporidium, 
a basal-branching apicomplexan that is the sec- 
ond leading cause of infant diarrhea globally. AP2 
domain-containing proteins have evolved to be the 
major regulatory family in the phylum to the ex- 
clusion of canonical regulators. We show that api- 
complexan and perkinsid AP2 domains cluster dis- 
tinctly from other chromalveolate AP2s. Protein- 
binding specificity assays of C. parvum AP2 do- 
mains combined with motif conservation upstream 
of co-regulated gene clusters allowed the construc- 
tion of putative AP2 regulons across the in vitro life 
cycle. Orthologous Apicomplexan AP2 (ApiAP2) ex- 
pression has been rearranged relative to the malaria 
parasite P. falciparum, suggesting ApiAP2 network
rewiring during evolution. C. hominis orthologs of 
putative C. parvum ApiAP2 proteins and target genes
show greater than average variation. C. parvum 
AP2 domains display reduced binding diversity relative
to P. falciparum, with multiple domains binding
the 5'-TGCAT-3', 5'-CACACA-3' and G-box motifs
(5'-G[T/C]GGGG-3'). Many overrepresented motifs in C.
parvum upstream regions are not AP2 binding mo- 
tifs. We propose that C. parvum is less reliant on 
ApiAP2 regulators in part because it utilizes E2F/DP1 
transcription factors. C. parvum may provide clues to 
the ancestral state of apicomplexan transcriptional 
regulation, pre-AP2 domination. 



INTRODUCTION 

Apicomplexan parasites are the causative agents of some of 
the world's most devastating infectious diseases, including 
malaria (caused by Plasmodium), toxoplasmosis (T. gondii)
and cryptosporidiosis (Cryptosporidium). Cryptosporidium 
species, primarily C. parvum and C. hominis, have recently 
been revealed to be the second leading cause of infant di- 
arrhea globally (1). Plasmodium and Cryptosporidium di- 
verged between 824 and 350 mya (2) with more recent esti- 
mates of ~420 mya (3,4), a distance comparable to that be- 
tween humans and the ancestral chordate (5). While RNA 
polymerase-associated factors and basal transcription fac- 
tors have been identified in the Apicomplexa (6), examina- 
tion of apicomplexan proteomes yielded a surprising dearth 
of 'typical' eukaryotic enhancer proteins and their charac- 
teristic binding sites (7,8). These findings were highly un- 
expected, given the extensive evidence for transcriptional 
control (9-11). The lack of recognizable, specific transcrip- 
tion factors initially suggested that the specific transcrip- 
tion factors were likely so divergent from those found in 
other eukaryotes that they were unrecognizable. Balaji et al. 
(8) took a more sophisticated approach to tackle this is- 
sue by generating new Hidden Markov Models (HMMs) 
to perform sensitive sequence analyses of several apicomplexan
genomes (Plasmodium, Cryptosporidium, Theileria)
for all known DNA-binding domains (8). The team con- 
firmed the dearth of 'typical' DNA-binding enhancer pro- 
teins, but they did identify Myb and zinc-finger proteins, 
as well as an unexpected family of proteins with multiple 
members present in all examined apicomplexan genomes. 
This Apicomplexan AP2 family of proteins (ApiAP2) bears 
resemblance to the AP2/ERF family of DNA-binding
transcription factors first identified in plants (8). Subsequent
work has indicated the near-complete domination of this 
acquired ApiAP2 family of transcription factors in apicom- 
plexan transcriptional regulation (12-17). How the AP2 do- 
main came to reside in the Apicomplexa and how radia- 
tion of this family led to the assumption of regulatory du- 
ties from traditional eukaryotic transcription factors, such 
as Myb and C2H2 zinc fingers, which are still found across 
Apicomplexa (18), or E2F/DP1, which is absent in all stud- 
ied apicomplexans except for Cryptosporidium (19), is unre- 
solved. 

At the initial discovery of ApiAP2 proteins in 2005, the 
authors postulated that the apicomplexan AP2 domain is 
likely of plant origin. Given the evolutionary history of the 
Apicomplexa which includes the secondary endosymbiosis 
of an alga (recently shown to be rhodophyte in origin (20)) 
whose only remnants are the non-photosynthetic apicoplast 
organelle, it was highly plausible that the AP2 domain had 
been transferred from the algal endosymbiont to the host 
nuclear genome (8). Many cases of gene transfer from al- 
gae, cyanobacteria, viruses and even metazoa to the nuclear 
genome have been documented in the Apicomplexa (21- 
25). This hypothesis regarding the origin of AP2 domains in 
the Apicomplexa has had little follow-up. However, as more 
genome sequences have been generated, AP2 domains have 
been found throughout the tree of life, notably in several 
bacteria and their phages (8,26,27), and as noted at their 
initial discovery (8), sequence similarity between these do- 
mains does not link apicomplexan AP2 domains to plant 
AP2 domains to the exclusion of these other groups. Im- 
portantly, in most other AP2 families identified, the AP2 
domain is associated with homing endonuclease or inte- 
grase domains of mobile elements. There is no evidence of 
mobile elements (active or otherwise) in the apicomplexan 
genomes examined here. Evidence of retrotransposable el- 
ements has been reported in another early branch of the 
Apicomplexa, the gregarines (28), and there is evidence that 
apicomplexan genomes which lack mobile elements used to 
contain them (29,30), or contain inactive elements as is the 
case in the apicomplexan coccidian parasite Eimeria tenella 
(31). More recently, non-integrase-associated bacterial AP2 
proteins with architectures similar to AP2 proteins found in 
plants and alveolates have been reported (32). These AP2 
proteins are also predicted to function as novel, specific 
transcription factors. 

ApiAP2 proteins are highly divergent in both sequence 
and length (ranging from ~200 to thousands of amino 
acids), and they generally display no discernable homology 
outside of the AP2 domain (which is ~60 aa in length). AP2 
domains are often the only globular domains in ApiAP2 
proteins, and the domain can occur in architectures of one 
to four or more per protein (8,33). Since the discovery of 
the ApiAP2 proteins, much work has been done both com- 
putationally and experimentally to implicate these proteins 
in gene regulation. Five ApiAP2 proteins have been iden- 
tified as key stage-specific regulators in Plasmodium (15- 
17,34,35). Another ApiAP2 protein (PFF0200c) has been 
implicated as a player in P. falciparum var gene regulation 
by binding the SPE2 DNA motif and acting as a DNA- 
tethering protein involved in formation and maintenance of 
heterochromatin (36). Campbell et al. (37) characterized the
binding motifs for all 27 members of the ApiAP2 family in
P. falciparum and used these data in conjunction with intra- 
erythrocytic expression data to predict putative regulatory 
targets of these proteins (37). The role of ApiAP2 proteins 
in gene regulation has also been investigated to a lesser de- 
gree in T. gondii, where several ApiAP2 proteins have been 
implicated in regulating progression through the cell cycle 
(38) as well as crucial virulence factors (14). Other stud- 
ies have implicated ApiAP2s in regulating a developmental 
transition (13). Radke et al. (12) recently characterized a T.
gondii AP2 that acts as a repressor of bradyzoite develop- 
ment. This is the first example of an ApiAP2 acting as a re- 
pressor and lends further support to the idea that members 
of the ApiAP2 family are multifaceted in their functional 
and regulatory capabilities. 

The ApiAP2 literature focuses extensively on studies of 
regulation in Plasmodium and Toxoplasma spp., and there
have been no extensive comparative studies of the AP2 
domain-containing proteins across the phylum. The rela- 
tively low fraction of total gene content that is conserved 
across the phylum (orthologs) (39) and the widely variable 
size of total gene content (40) necessitate the evolution of 
the gene regulatory networks to facilitate these vastly dif- 
ferent regulatory demands. Studies to date have not defini- 
tively addressed whether orthologous apicomplexan AP2 
domains recognize the same DNA sequence motif and if 
these motifs are found upstream of orthologous sets of 
genes across apicomplexans. The sparse data that do exist 
suggest that putative ApiAP2 regulons may be quite dif- 
ferent. For example, Campbell et al. found that though P. 
vivax, P. yoelii and P. falciparum AP2 domains are nearly
perfectly conserved, and the timing of orthologous ApiAP2 
protein expression is very similar, putative target gene sets 
of orthologous ApiAP2s are highly divergent (37). De Silva
et al. (41) investigated the binding specificity of a single C. 
parvum domain that was highly conserved with a Plasmod- 
ium AP2 domain. They found that the binding specificities 
were absolutely conserved; however, of the 127 putative P. 
falciparum targets of regulation, only 26 are conserved in 
C. parvum, suggesting the transcriptional network itself has 
evolved considerably since Plasmodium and Cryptosporid- 
ium diverged (41). 

In this study, we have used HMMs and phylogenetic 
analysis to examine the distribution and evolutionary re- 
lationships of the AP2 DNA-binding domains across the 
Apicomplexa and an outgroup perkinsid oyster parasite, 
Perkinsus marinus. We also examine the relationship of
these apicomplexan and perkinsid AP2 domains to the 
AP2 domains found throughout the chromalveolates. We 
used the insights gained from these comparisons to select 
AP2 protein domains representing most of the AP2 family
from the basal-branching apicomplexan Cryptosporidium
parvum for further study. We determined the binding
specificities of these domains experimentally and searched 
for the identified binding motifs upstream of co-regulated 
C. parvum gene clusters to identify putative regulatory tar- 
gets with the goal of identifying the putative ApiAP2 tran- 
scriptional regulatory network in this organism. Finally, we 
compare C. parvum results to the limited data available from 
C. hominis. 

MATERIALS AND METHODS

Identification of AP2 and ApiAP2 domains 

To identify ApiAP2 domains for phylogenetic analyses, we 
developed an HMM that appears to be more sensitive to 
the specific detection of ApiAP2s than the Pfam HMM de- 
signed for the detection of AP2 domains (www.pfam.org). 
We first ran the existing AP2 HMM on annotated pro- 
tein sequences for apicomplexans T. gondii, Neospora can- 
inum, P. falciparum, P. vivax, C. parvum, Theileria annulata 
and T. parva using HMMER (version 2.4i; http://hmmer. 
org/). We opted to use P. falciparum gene IDs from Plas- 
moDB version 6.0 (http://plasmodb.org) to facilitate com- 
parisons with existing P. falciparum AP2 domain binding 
data, and we have provided a look-up table to the most recent
gene IDs (Supplementary File S1, Table S1). We next
constructed an alignment with the T-Coffee package (42)
of the most significant domain hits from this run (1e-4 or
lower). The ApiAP2 HMM was built from this alignment 
using HMMER. We used this new HMM in conjunction 
with the Pfam AP2 HMM to search annotated protein se- 
quences to examine the distribution of the AP2 domain 
across several chromalveolates, including apicomplexans P. 
falciparum, P. knowlesi, P. vivax, P. yoelii, T. parva, T. annulata,
Babesia bovis, N. caninum, T. gondii, Cryptosporidium
muris and C. parvum; the perkinsid oyster parasite Perkin- 
sus marinus; dinoflagellates Karenia brevis and Alexan- 
drium tamarense; ciliates Tetrahymena thermophila, Parame- 
cium tetraurelia and Ichthyophthirius multifiliis and stra- 
menopiles Thalassiosira pseudonana, Phaeodactylum tricor- 
nutum, Ectocarpus siliculosus, Phytophthora infestans and 
Phytophthora sojae. Extant representatives of purported
algal endosymbionts Cyanidioschyzon merolae, Porphyra
purpurea, P. yezoensis (representative rhodophytes) and
Chlamydomonas reinhardtii and Micromonas sp. RCC299 
(representative chlorophytes) were also examined. All Plas- 
modium annotated proteins were obtained from Plas- 
moDB version 6.0. All T. gondii and N. caninum anno- 
tated proteins were obtained from ToxoDB version 5.2. 
C. parvum annotated proteins were obtained from Cryp- 
toDB version 4.6. T. parva data were obtained from TIGR 
Eukaryotic Genome Projects (ftp://ftp.tigr.org/pub/data/ 
Eukaryotic_Projects/t_parva/annotation_dbs/). T. annulata 
data were obtained from the Wellcome Trust Sanger Institute
(http://www.sanger.ac.uk/Projects/Pathogens/). As no
annotated protein sequences were available at the time of 
our analyses, dinoflagellate analyses were run on six-frame 
translations of clustered Expressed Sequence Tags (ESTs) 
(Open Reading Frames [ORFs] > 75 AA) and perkinsid 
analyses were run on six-frame translations of the genome 
(ORFs > 50 AA). The sequences of all domains identified
via six-frame translations are available in Supplementary 
File S2. All other organism annotated protein sequence data 
were downloaded from the National Center for Biotechnol- 
ogy Information, NCBI GenBank (http://www.ncbi.nlm. 
nih.gov). AP2 protein and domain counts for each organism 
were determined using a permissive domain e-value cutoff 
of 10. 

We have found AP2 domain detection to be very sensitive
to the size of the database searched. Some weakly scoring
C. parvum domains are detected when its proteins are
searched on their own but do not reach significance when
proteins from all included organisms are searched jointly
(35 domains are detected with the ApiAP2 HMM searching
C. parvum only versus 21 domains when all organism proteins
are searched concurrently; Supplementary File S1, Table S2).
Due to these fluctuations, domain counts are only approximate.
We tested the two most significant additional domains detected
when C. parvum was searched alone (cgd2_2990 and cgd3_1980)
on protein-binding microarrays (PBMs). Only one of these,
cgd2_2990, had a detectable binding motif.
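
The dependence on database size follows from how profile-search E-values are defined: an E-value is the expected number of false hits at a given score, so for a fixed score it grows in proportion to the number of sequences searched. The short sketch below illustrates this scaling. The function name, the protein counts and the E-value are hypothetical placeholders chosen for illustration, not values from our searches.

```python
def rescale_evalue(evalue: float, n_searched: int, n_target: int) -> float:
    """Rescale an E-value from a search of n_searched sequences to n_target.

    Assumes E = p * N, i.e. the E-value at a fixed score is proportional
    to the number of sequences searched.
    """
    if n_searched <= 0 or n_target <= 0:
        raise ValueError("database sizes must be positive")
    return evalue * n_target / n_searched


# Hypothetical numbers: a weak domain hit with E = 0.5 when one species
# (4,000 proteins) is searched alone, re-evaluated against a pooled
# multi-species set of 200,000 proteins.
single_species = 0.5
pooled = rescale_evalue(single_species, n_searched=4_000, n_target=200_000)
print(f"single-species E = {single_species}, pooled E = {pooled}")
```

Under these assumed numbers the same hit passes a permissive cutoff of 10 in the single-species search but fails it in the pooled search, which is the kind of fluctuation described above.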

Phylogenetic analysis of AP2 domains 

Determination of homolog groups. All phylogenetic anal- 
yses were carried out on AP2 domain sequences only, as 
full-length proteins are generally too divergent to be able to 
detect meaningful evolutionary relationships between them 
(as determined by multiple sequence alignment; data not 
shown). Alignments of AP2 domain sequences were per- 
formed using the HMMALIGN command of the HMMER 
package (version 2.4i; http://hmmer.org/) and edited using 
Jalview (43) (edited alignments can be found in Supplemen- 
tary Files S3 and S4). Unrooted maximum likelihood trees 
were constructed from top-scoring (1e-3 or better) domain
sequences across chromalveolates and green algae C. reinhardtii
and Micromonas (Supplementary File S1, Table S3)
using RAxML (version released 4/26/2012) with a gamma 
rate estimation and Dayhoff model of codon substitution 
(44). Taxa with very similar representatives were not in- 
cluded in the tree for purposes of simplification (P. knowlesi, 
P. yoelii, T. parva, N. caninum). Bootstrap support was
obtained from 100 replicates. Trees were visualized using 
FigTree (v. 1.4.0; http://tree.bio.ed.ac.uk/software/figtree/). 
P. falciparum and C. parvum AP2 domains alone were ana- 
lyzed and a tree was created using the same methods. 

To identify homologous clusters of AP2 domains, a lo- 
cal install of the OrthoMCL algorithm (45) was run on all 
identified AP2 domains in apicomplexans and perkinsids 
using an e-value ranging from 1e-04 to 1e-11. Domains displaying
similarity at these e-values were clustered into homolog
groups. Homolog groups found at 1e-06 were used
for subsequent analyses, as this is the highest stringency at
which orthology could be detected between apicomplexan
and P. marinus AP2 domains; 1e-11 is the highest stringency
at which orthology between C. parvum and other apicom- 
plexan AP2 domains can be detected. Relationships were 
visualized using Circos (46). 
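
To make the effect of the e-value cutoff concrete, the following sketch groups domains at several stringencies. It is a deliberately simplified stand-in: it links any two domains joined by a pairwise hit at or below the cutoff (single-linkage grouping with a union-find structure), whereas OrthoMCL itself works from reciprocal best hits and Markov clustering. The domain labels and e-values are hypothetical.

```python
from collections import defaultdict


def cluster_domains(hits, cutoff):
    """Group domains linked by any pairwise hit with e-value <= cutoff.

    hits: iterable of (domain_a, domain_b, evalue) tuples.
    Returns a list of sets, one per group with at least two members.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, evalue in hits:
        if evalue <= cutoff:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical pairwise hits between C. parvum (cp), P. falciparum (pf)
# and P. marinus (pm) domains.
hits = [
    ("cp_A", "pf_A", 1e-12),
    ("cp_A", "pm_A", 1e-7),
    ("cp_B", "pf_B", 1e-5),
]
for cutoff in (1e-4, 1e-6, 1e-11):
    groups = sorted(sorted(g) for g in cluster_domains(hits, cutoff))
    print(cutoff, groups)
```

In this toy input the perkinsid domain drops out of its group once the cutoff is tightened past 1e-7, mirroring how the set of detectable homolog groups shrinks as stringency increases.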

Comparisons between C. parvum and C. hominis 

We focused on C. parvum in our phylogenetic analyses 
because of the better genome assembly (18 versus 1422 
contigs for C. hominis) and availability of gene expression 
data. However, these species do exhibit differing host ranges 
and pathogenicity, and comparisons, where possible, are 
warranted. C. hominis is the predominant Cryptosporidium 
pathogen of humans, and the available genome sequence is 
97% identical to C. parvum (47). The ApiAP2 HMM was 
run on C. hominis annotated proteins (downloaded from 
CryptoDB.org version 6.0). Orthologous C. parvum and C.
hominis ApiAP2 proteins, as well as select upstream regions
of orthologous predicted ApiAP2 target genes, were evaluated
for conservation. C. parvum and C. hominis AP2 domains
were compared to each other using the blastp program
of NCBI BLAST (version 2.2.26). Upstream regions of select
C. parvum AP2 target genes were compared against orthologous
C. hominis upstream regions using NCBI blastn
(version 2.2.26).

Determination of C. parvum ApiAP2 binding motifs 

N-terminal GST fusion proteins were made using the 
pGEX4T-1 vector (GE Healthcare) and the 23 predicted
C. parvum ApiAP2 domains and their flanking residues. 
Many flanking residues were included to ensure capture of 
each domain. Domain boundaries were determined using 
custom-built HMMs run on all annotated C. parvum pro- 
teins (downloaded from CryptoDB.org, version 4.6). The 
domains and flanking sequence were PCR-amplified and 
cloned into the BamHI restriction site in pGEX4T-1. Proteins
were expressed and purified as previously described
(41). Briefly, E. coli BL21 (RIL Codon PLUS, Strata- 
gene) cells were induced with 200 mM isopropyl-beta-D- 
thiogalactopyranoside (IPTG) at 25°C. Proteins were then 
purified using Uniflow Glutathione Resin (Clontech) and 
eluted in 10 mM reduced glutathione, 50 mM Tris-HCl,
pH 8.0. Proteins were verified with western blots using an 
anti-GST antibody (Invitrogen) and purity was verified by 
silver stain. 

A minimum of two PBM experiments were performed 
with each purified protein construct to determine their 
binding specificities as previously described (37,41). Mo- 
tifs bound at a threshold of 0.45 or greater were considered 
significant. Similarity between C. parvum ApiAP2 binding 
sites was determined using the web-based STAMP tool (48). 
Comparisons between orthologous C. parvum and P. falci- 
parum ApiAP2 binding sites (using P. falciparum ApiAP2 
binding motif data from (37,41)), as well as comparisons 
between C. parvum ApiAP2 binding sites and C. parvum 
overrepresented upstream motifs (49) were also made using 
STAMP.

Predictions of putative ApiAP2 target genes 

Definition of C. parvum upstream regions. Upstream re- 
gions were designated as in (49). Briefly, we downloaded the 
C. parvum genome (v 4.2) and nucleotide sequences for all 
protein-encoding genes from CryptoDB (http://cryptodb. 
org/cryptodb/, (50)). Custom Perl scripts were used to ex- 
tract either (i) 1 kb of sequence upstream of each translation 
start site, or (ii) the upstream sequence until a gene was en- 
countered on either strand. The translational start site was 
used because we do not have untranslated region (UTR) 
information for predicted genes. The C. parvum genome is 
only 9.1 Mb and is highly compact with very few introns 
and small intergenic spaces. To exclude the possibility of in- 
cluding coding regions in this set due to misannotation, a 
BLASTX was performed against the NCBI NR database 
using the set of upstream sequences as the query. Upstream 
sequences that contained significant portions of 100% iden- 
tity to coding sequences were eliminated. 
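
The extraction step can be sketched as follows. This is an illustrative Python version, not the original Perl scripts. It assumes that the two criteria are combined by truncating the 1 kb window at the nearest neighbouring gene, and it ignores overlapping genes. The function names, the coordinate convention (0-based, half-open, forward strand) and the toy sequence are all assumptions made for the example.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_sequence(chrom, gene, other_genes, max_len=1000):
    """Return sequence upstream of a gene's translation start site.

    chrom: chromosome sequence (str).
    gene: (start, end, strand) in 0-based half-open forward coordinates.
    other_genes: list of (start, end) for all other genes, either strand.
    The window is at most max_len bases and stops at the nearest gene.
    """
    start, end, strand = gene
    if strand == "+":
        lo = max(0, start - max_len)
        # Closest gene that ends at or before this gene's start codon.
        ends = [e for _, e in other_genes if e <= start]
        if ends:
            lo = max(lo, max(ends))
        return chrom[lo:start]
    # Minus strand: upstream lies to the right on the forward strand.
    hi = min(len(chrom), end + max_len)
    starts = [s for s, _ in other_genes if s >= end]
    if starts:
        hi = min(hi, min(starts))
    return reverse_complement(chrom[end:hi])


# Toy 20-bp chromosome made of four 5-bp blocks.
chrom = "AAAAA" + "CCTCC" + "GATTC" + "TTTTT"
plus_up = upstream_sequence(chrom, (10, 15, "+"), [(0, 5)])
minus_up = upstream_sequence(chrom, (5, 10, "-"), [(15, 20)])
print(plus_up, minus_up)
```

Returning minus-strand regions as their reverse complement keeps every upstream sequence in the same orientation relative to its gene, which simplifies downstream motif searches.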



Target gene prediction. We modified the target prediction 
algorithm used in (37) for use with our data to identify 
putative AP2 target genes. This algorithm takes position 
weight matrices derived from PBM scores for each AP2 do- 
main and searches for matches in the upstream sequence 
database. Each AP2 is assigned a score for each gene based 
on motifs found. The glmnet package in R (51) is then im- 
plemented to make a regression between this AP2 motif 
score and the expression pattern for each gene (C. parvum 
expression data from (52)) to determine how much the AP2 
motif contributes to each gene's expression. An average ex- 
pression pattern for genes possessing a particular AP2 mo- 
tif upstream is then iteratively built, and genes that match 
this average expression pattern within a statistical threshold 
are designated as putative regulatory targets. P. falciparum 
regulatory targets were previously defined using a false dis- 
covery rate of 1% (37). As we have comparatively few time 
points over which we have expression information (seven for 
C. parvum versus 47 for P. falciparum) and thus have less 
statistical power, we considered genes falling within a false 
discovery rate of 25% as putative regulatory targets. 
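
The motif-search step of this procedure can be illustrated with a minimal position weight matrix (PWM) scan. The sketch below is not the algorithm from (37): it builds a log-odds matrix from base counts rather than from PBM scores, assumes a uniform background (a simplification for an AT-rich genome), scans one strand only, and omits the regression against expression data. All names and input values are hypothetical.

```python
import math


def log_odds_pwm(counts, background=0.25, pseudocount=1.0):
    """Convert per-position base counts into a log2-odds matrix."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2((column.get(base, 0) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def best_match(pwm, sequence):
    """Return (score, position) of the best-scoring window, forward strand."""
    width = len(pwm)
    best_score, best_pos = float("-inf"), -1
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases
        score = sum(col[base] for col, base in zip(pwm, window))
        if score > best_score:
            best_score, best_pos = score, i
    return best_score, best_pos


# Hypothetical counts for a TGCAT-like motif (10 observations per column).
counts = [{"T": 10}, {"G": 10}, {"C": 10}, {"A": 10}, {"T": 10}]
pwm = log_odds_pwm(counts)
score, position = best_match(pwm, "AAATGCATAAA")
print(round(score, 2), position)
```

In a fuller implementation the per-gene motif score would be computed on both strands of each upstream region and then used as the predictor in the regression described above.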

Evaluating evolutionary history of AP2 domains versus evo- 
lutionary history of putative target genes 

Putative target genes of shared ('ancestral' or 'pan- 
apicomplexan') and lineage-specific ApiAP2 domains were 
compared against lists of three different evolutionary 
classes of apicomplexan genes as determined by Or- 
thoMCL: (i) those shared between all of 12 apicomplexans 
(the 11 used for all other analyses, as well as P. berghei); (ii) 
genes shared between apicomplexans of at least two differ- 
ent genera and (iii) genus-specific genes (genes which have 
no orthologs outside of their respective genus). Putative tar- 
gets were then classified as 'shared' or 'lineage-specific'.

Comparisons between orthologous C. parvum and P. falci- 
parum ApiAP2 networks 

P. falciparum orthologs for C. parvum ApiAP2 targets were 
identified using the 'transform by orthology' tool at Eu- 
pathDB (v. 2.17, http://EupathDB.org). Lists of putative 
targets for each orthologous P. falciparum ApiAP2 were 
then searched with the list of C. parvum ApiAP2 target or- 
thologs to identify shared targets. 
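
The comparison amounts to mapping each C. parvum target onto its P. falciparum orthologs and intersecting the result with the target list of the orthologous P. falciparum ApiAP2. A minimal sketch follows; the gene identifiers are hypothetical, and the orthology table is assumed to be available as a dictionary (in practice it would come from the orthology transform described above).

```python
def shared_targets(cp_targets, orthologs, pf_targets):
    """Return P. falciparum targets that are orthologs of C. parvum targets.

    cp_targets: iterable of C. parvum target gene IDs.
    orthologs: dict mapping a C. parvum gene ID to a set of
               P. falciparum ortholog IDs (may be one-to-many).
    pf_targets: iterable of P. falciparum target gene IDs.
    """
    mapped = set()
    for gene in cp_targets:
        mapped.update(orthologs.get(gene, ()))
    return mapped & set(pf_targets)


# Hypothetical identifiers for illustration only.
cp_targets = ["cp_gene1", "cp_gene2", "cp_gene3"]
orthologs = {"cp_gene1": {"pf_geneA"}, "cp_gene2": {"pf_geneB", "pf_geneC"}}
pf_targets = ["pf_geneA", "pf_geneC", "pf_geneZ"]
shared = shared_targets(cp_targets, orthologs, pf_targets)
print(sorted(shared))
```

Targets without any ortholog (cp_gene3 here) simply contribute nothing to the overlap.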

RESULTS 

Perkinsid and apicomplexan AP2 domain families appear dis- 
tinct from other chromalveolate AP2 domains 

It is not known if AP2 domains were present in the chro- 
malveolate ancestor, or if they arrived one or more times as 
a result of the multiple endosymbiotic and lateral transfer 
events that characterize the chromalveolates (21-24,53,54). 
Thus, we examined the distribution of AP2 domains across 
several chromalveolates including apicomplexans, a perkin- 
sid, dinoflagellates, ciliates and stramenopiles, as well as in 
extant representatives of the endosymbiont donors to the 
chromalveolates, rhodophytes and chlorophytes using both 
a custom-built ApiAP2 HMM (Supplementary File S5) and
an existing AP2 HMM available from Pfam (Figure 1; Supplementary
File S1, Tables S4 and S5). Phylogenies constructed
from the identified domain sequences indicate that
perkinsid Perkinsus marinus AP2 domains are closely related
to apicomplexan AP2 domains, and both are more distantly
related to other chromalveolate/endosymbiont AP2
domains (Figure 2). Deep evolutionary relationships are
difficult, if not impossible, to recover due to the short length
(~60 amino acids) of the domain and lack of discernable
homology over the rest of the protein (as determined by
multiple sequence alignment; data not shown); thus, most
deep relationships within the tree are not well-supported.
However, the bootstrap support for the divide between
apicomplexan/perkinsid AP2 domains and the rest of the
chromalveolates and algae is highly significant (Figure 2).

Figure 1. Distribution and quantification of AP2 proteins and domains
across chromalveolates and algae. Counts of AP2 domain-containing proteins
and the number of AP2 domains per species as determined by sensitive
sequence profile analysis using either the AP2 HMM available from
Pfam or our custom ApiAP2 HMM. Analyses on most species were run
on fully annotated protein sets. **Dinoflagellate analyses were run on clustered
EST data. *P. marinus analyses were run on clustered genome ORFs.
These counts represent profile matches at or below a permissive e-value of
10. Approximately 85% of the hits were at or below 1e-3.

Perkinsid and apicomplexan AP2 domains can be classified 
into evolutionary clades 

There are two kinds of intraphylum domains: restricted
lineage-specific domains and as many as 19 domains shared
between all or most apicomplexans (Supplementary File S6,
Figure S1; Table 1). Domain counts and composition of homolog
groups vary depending on the stringency of e-value
parameters used to assign orthologs to clusters; thus, we indicate
ranges of domains determined by OrthoMCL clustering
at 1e-4 to 1e-11 in Table 1 (Supplementary File S1,
Tables S6-S9). We determined that 1e-6 is the most stringent
e-value at which orthology between apicomplexan and
perkinsid AP2 domains can be detected, and we used
an e-value cutoff of 1e-6 for subsequent analyses. We used
phyletic distribution
data for all predicted perkinsid and apicomplexan AP2
domains to further subdivide them into evolutionary clades 
and classify the C. parvum domains. Twenty-three AP2 do- 
mains were detected in 18 different C. parvum proteins using 
these cutoffs. 

The 23 C. parvum AP2 domains were further classified 
as ancestral, pan-apicomplexan or lineage-specific based on 
their phyletic distribution with OrthoMCL clustering using 
an e-value cutoff of 1e-6 (Figure 3). Apicomplexan AP2 domains
that clustered with a P. marinus AP2 domain were
classified as ancestral; these domains likely predate the divergence
between perkinsids and the Apicomplexa. Four
C. parvum AP2 domains (cgd4_1110_D1, cgd4_1110_D3,
cgd8_3130 and cgd8_3230) are ancestral (Figure 3). Domains
are indicated by gene ID and, in the case of multidomain
proteins, numbered D1-D4 starting from the N-terminus.
Domains that span all or most apicomplexan lineages
but were absent in Perkinsus were classified as pan-apicomplexan
(10 C. parvum domains are in this category).
The remaining nine domains have no orthologs outside of 
Cryptosporidium and are classified as lineage-specific. It is
possible that some pan-apicomplexan domains were also
present in the perkinsid/apicomplexan ancestor
and were subsequently lost in Perkinsus. Because
there is no extant evidence of these domains in Perkin- 
sus and because there is ambiguity with respect to when 
these domains arose, we maintain separate 'ancestral' and 
'pan-apicomplexan' designations. Lineage-specific domains 
have no identifiable orthologs outside their respective taxa, 
though again it is a formal possibility that these could also 
be true 'ancestral' domains that were lost in other lineages. 
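
The classification rule described in this section can be written compactly. The sketch below is a simplification: it treats any ortholog in another apicomplexan genus as evidence for the pan-apicomplexan class, whereas the text requires that such domains span all or most apicomplexan lineages. The function name and the example genus sets are hypothetical.

```python
def classify_domain(cluster_genera, own_genus, outgroup="Perkinsus"):
    """Classify a domain from the genera represented in its homolog group.

    cluster_genera: genera with at least one domain in the group.
    own_genus: genus of the domain being classified.
    outgroup: genus whose presence marks the group as ancestral.
    """
    others = set(cluster_genera) - {own_genus}
    if outgroup in others:
        return "ancestral"
    if others:
        return "pan-apicomplexan"
    return "lineage-specific"


# Hypothetical homolog groups for three C. parvum domains.
examples = {
    "domain_1": {"Cryptosporidium", "Plasmodium", "Perkinsus"},
    "domain_2": {"Cryptosporidium", "Plasmodium", "Toxoplasma"},
    "domain_3": {"Cryptosporidium"},
}
labels = {
    name: classify_domain(genera, "Cryptosporidium")
    for name, genera in examples.items()
}
print(labels)
```

Because the rule relies only on presence in extant genomes, it cannot distinguish a domain that arose after the perkinsid split from one that was ancestral and later lost in Perkinsus, which is the ambiguity noted above.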

C. parvum ApiAP2 domains bind diverse sequences

De Silva et al. (41) determined the DNA-binding specificity
of the C. parvum AP2 domain cgd2_3490, and we previously
reported the DNA-binding specificity of cgd8_810
(41,49). To determine binding specificities for the remaining
21 predicted C. parvum AP2 domains, we created protein
expression constructs and assayed binding using PBMs and
cgd2_3490 as a control. Our results using PBMs agree with
the previously reported 5'-TGCAT-3' core binding motif for
cgd2_3490, and we detect new binding specificities for 15 of
the remaining predicted C. parvum AP2 domains (Figure 3).

Figure 2. Evolutionary relationships of AP2 domains across chromalveolates and algae. Maximum likelihood tree constructed of top-scoring AP2 domains
(hmmsearch domain e-value of 1e-3 or better using the AP2 HMM) from selected taxa. Bootstrap support values obtained from 100 replicates are indicated on
nodes where support is 50% or greater. Tree constructed with RAxML using a gamma rate estimation and a Dayhoff model of codon evolution, and
visualized using FigTree (v. 1.4.0; http://tree.bio.ed.ac.uk/software/figtree/). Species abbreviation prefixes have been added to non-apicomplexan gene IDs
for ease of understanding: At_, A. tamarense; Cr_, C. reinhardtii; K_brevis_, K. brevis; Micro_, Micromonas; Pm_, P. marinus; Ptric_, P. tricornutum; Tt_, T.
thermophila; Tp_, T. pseudonana. Perkinsid and ApiAP2 domains group together outside of other chromalveolate and green algae AP2 domains with high
bootstrap support, indicating possible independent origins for these domains.

Table 1. AP2 domain counts by evolutionary group. AP2 domain evolutionary classes across apicomplexans and P. marinus as determined by OrthoMCL clustering at e-values ranging from 1e-4 to
1e-11. There is no detectable orthology to P. marinus domains above 1e-6; the 'ancestral' and 'P. marinus-specific' categories were not determined above
this cutoff. All identified apicomplexan and perkinsid AP2 domains were subjected to clustering. Refer to Supplementary File S1, Tables S6-S9 for IDs of
domains falling into each classification at each e-value.

Figure 3. C. parvum and P. falciparum AP2 domain ortholog binding motifs
as determined by PBM. C. parvum domains are color-coded according
to evolutionary groups based on OrthoMCL clustering at 1e-6 as discussed
in Materials and Methods. Data for core DNA motifs determined for P.
falciparum AP2 domains obtained from (37).

We find that C. parvum AP2 domains bind a diversity 
of sequences similar to what is seen in P. falciparum. Al- 
though the C. parvum AP2 family can recognize a variety 
of sequences, we find that of the 15 domains for which we
detected binding motifs, 10 of these bind one of three mo- 
tif types: the 5'-TGCAT-3' motif (recognized by four do- 
mains from four different proteins), the 5'-CACACA-3' mo- 
tif (recognized by four domains from four different pro- 
teins) or the G-box motif (5'-G[T/C]GGGG-3', recognized 
by three domains from three different proteins). Cgd8_810 
and cgd2_2990 bind the G-box as their primary motif, 
while cgd1_3520 binds the G-box motif secondarily (and is
counted in each category; see below). Motifs and their PBM 
enrichment scores can be found in Supplementary File S6, 
Figures S2 and S3. P. falciparum also has four CACACA- 
binding AP2 domains, but this is the only markedly redun- 
dant P. falciparum AP2 binding motif (37). 

Secondary and tertiary motif recognition. Secondary DNA 
binding motifs are thought to impart DNA-binding pro- 
teins with a broader range of binding interactions and 
thereby expand the repertoire of genes regulated by tran- 
scription factors (55). Multiple binding specificities above 
threshold were previously reported for several P. falciparum 
AP2 domains (37). Many of these secondary or tertiary 
binding sites had little similarity, indicating an additional 
layer of complexity to ApiAP2 regulation. C. parvum AP2s 
also display multiple motif recognition, though in the ma- 
jority of cases secondary motifs are highly similar to, or are 
reverse complements of, the primary motif (Supplementary 
File S6, Figure S3). We find that only one C. parvum domain,
cgd1_3520, is able to recognize two completely different
motifs, both the 5'-TGCAT-3' motif and the G-box.
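
Whether a secondary motif is merely the reverse complement of the primary motif is straightforward to check programmatically. The sketch below uses exact string comparison on core motifs named in the text; comparisons of full PBM-derived matrices (as done with STAMP) are more permissive, so this is an illustration of the idea rather than the method used. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Return the reverse complement of an unambiguous DNA motif."""
    return motif.translate(_COMPLEMENT)[::-1]


def same_site(primary, secondary):
    """True if secondary is the primary motif or its reverse complement."""
    return secondary in (primary, reverse_complement(primary))


print(reverse_complement("TGCAT"))     # opposite-strand reading of TGCAT
print(reverse_complement("GGTGCACC"))  # palindromic: reads the same
print(same_site("CACACA", "TGTGTG"))
print(same_site("TGCAT", "GTGGGG"))
```

Palindromic motifs such as 5'-GGTGCACC-3' are their own reverse complement, so for these the two strand readings describe a single site.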

Binding motif conservation between putative P. falciparum
and C. parvum orthologs 

It was noted previously that orthologous AP2 domains 
across P. falciparum, P. berghei and C. parvum (gene 
ids PF14_0633, PBANKA_132980 and cgd2_3490, respectively) 
have nearly identical binding specificities for the 
5'-TGCATGCA-3' motif (17,41). Our phylogenetic analyses 
support the orthology of this domain group, and we find 
an additional putative C. parvum ortholog to PF14_0633 
(cgd1_3520) that also recognizes this motif (Figure 3). The 
putative orthologs cgd8_3130 and PF14_0533 bind highly 
similar motifs, as does putative ortholog pair cgd8_3230 and 
PFE0840c_D2. 

A short, conserved linker region between AP2 domains is 
found in five P. falciparum ApiAP2 proteins (37). C. parvum 
proteins with multiple domains do not appear to contain 
this linker. The C. parvum multidomain protein cgd6_5320 
has four predicted AP2 domains, and cgd4_1110 has three 
(Figure 3). Whether the C. parvum proteins utilize multiple 
DNA-binding regions simultaneously remains to be determined. 
Interestingly, C. parvum AP2 domain cgd4_3820 recognizes 
the sequence 5'-GGTGCACC-3', while its putative 
P. falciparum ortholog PFF0200c_D2 (38% identity, with no 
conservation of residues predicted to be important for 
base-specific contacts (56)) failed to bind DNA as measured 
by PBMs. However, a construct of both AP2 domains, 
PFF0200c_D1 (which does show binding) and PFF0200c_D2, 
joined by a short conserved linker region does bind the 
same motif as cgd4_3820. The D1 domain of 
PFF0200c shares only a single base-specific contact residue 
with cgd4_3820. These findings suggest that the binding in- 
teractions and specificities are complex. 

Binding specificity is not conserved between the puta- 
tive orthologs cgd4_1110_D3 and PFE0840c_D2, or pu- 
tative orthologs cgd5_4250 and PF14_0079. There is no 
binding specificity above threshold in C. parvum for two 
domains (cgd6_5320_D3 and cgd6_5320_D4) whose puta- 
tive orthologs (PF11_0404 and PFL1900w, respectively) do 
have binding motifs. These ill-conserved binding specifici- 
ties may indicate that these domains are not true orthologs. 
Alternatively, the lack of conservation may be a true snap- 
shot of evolving binding specificities, especially given the 
significant support of conserved binding specificities for the 
other putative ortholog groups. 

Though putative orthologous apicomplexan AP2 do- 
mains often have similar binding specificities, evolution- 
ary distance does not always predict binding specificity. 
We constructed a maximum likelihood tree of all predicted 
P. falciparum and C. parvum ApiAP2 domains and super- 
imposed their binding motifs to examine the relationship 
between evolutionary distance and binding motif (Figure 
4). AP2 domains that recognize similar motifs are inter- 
spersed throughout the tree. Putative orthologs PF14_0633, 
cgd2_3490, cgd1_3520 and cgd8_3230 all bind 
5'-TGCAT-3'-like motifs, and are clustered together on the tree, though 
we also find TGCAT-binding AP2 domains that are more 
distantly related to this group. The G-box and CACACA- 
binding AP2 domains are more distantly related. These 
phyletic distributions could be explained by duplication of 
domains and divergence of their binding sites both within 
each species and as a consequence of speciation events. 
Determination of the families of AP2 binding motifs for 
intermediate taxa, such as T. gondii or the Piroplasmida, may 
further elucidate the relationship between AP2 binding sites 
and evolutionary descent. 

Figure 4. Maximum likelihood tree of P. falciparum and C. parvum AP2 
domains and their DNA-binding motifs. Domain sequences were extracted 
from full-length proteins using HMM-defined coordinates, aligned and 
edited as described in Materials and Methods. A maximum likelihood tree 
was constructed from the edited alignment using RAxML with a Dayhoff 
protein evolution model, then visualized using FigTree. Domains in red 
were not identified in (37); as all domains were numbered from N-terminus 
to C-terminus, the numbering scheme therefore shifts slightly from 
(37). The previous domains PF11_0404_D3, PF13_0235_D3, PF13_0267 
and PF13_0026 correspond in this figure to domains PF11_0404_D4, 
PF13_0235_D4, PF13_0267_D2 and PF13_0026_D2, respectively. *Denotes 
domains that, when tested alone, have no detectable binding specificity, but 
when tested with an adjacent domain, have the indicated binding 
specificity. **Denotes tested domains with no binding motifs over threshold. 



Multiple ApiAP2 domains can bind C. parvum overrepre- 
sented upstream motifs 

We previously reported 11 families of overrepresented 
motifs located upstream of C. parvum genes (49). Two of these 
motif families are known transcription factor motifs, 
E2F-like and CAAT-box-like. Another of these motif families, 
5'-TGCAT-3', which has a palindromic core, is an ApiAP2 
binding site (designated AP2_r in (49)). We found three 
additional C. parvum ApiAP2 proteins that bind this 
motif (Figure 5). While not constitutive, at least one cluster 
containing the 5'-TGCAT-3' motif in its upstream 
regions is highly expressed at each of the surveyed life cycle 
time points (49). The same is true for transcripts 
representing each of the four TGCAT-binding ApiAP2 proteins; at 
least one transcript is maximally expressed at each of the 
surveyed time points (see cgd8_3230, cgd1_3520, cgd5_4250 
and cgd2_3490 in Supplementary File S6, Figure S5). We 
additionally reported that ApiAP2 cgd8_810 binds the 
overrepresented G-box motif (49), and we find that cgd2_2990 
and cgd1_3520 also recognize the G-box. Cgd2_2990 has a 
bimodal expression pattern, peaking at 6 and (to a lesser 
degree) 24 h post-infection; cgd1_3520 has peak expression 
at 12 h post-infection, while cgd8_810 is expressed at 
multiple later time points. Clusters containing overrepresented 
G-box motifs in the upstream regions of their genes are 
also maximally expressed, individually, at any of the 
surveyed time points across the life cycle. These results suggest 
that regulation of these differentially expressed gene 
clusters might be handled by the respective coexpressed ApiAP2s 
(Figure 6). 

Figure 5. Overrepresented C. parvum motifs bound by AP2 domains. 
Overrepresented motifs as determined by (49). Overrepresented motifs that 
are (i) recognized by known non-AP2 proteins (E2F; CAAT-box) or (ii) are 
not AP2 binding motifs (GAGA-like; Unknown Set1, Unknown Set2 and 
Unknown Motifs 14, 21, 22 and 25) are not included. 

We did not detect ApiAP2 protein-DNA interactions for 
nine additional previously predicted overrepresented up- 
stream motifs (49). Interestingly, we identified four different 
AP2 domains that can bind the palindromic 5'-CACACA-3' 
motif, yet the motif is not overrepresented upstream of the 
200 coregulated C. parvum gene clusters we previously iden- 
tified. We were able to predict putative regulatory targets for 
two of these CACACA-binding AP2 domains, cgd8_3130 
and cgd4_600. The other CACACA-binding AP2 domains, 
cgd5_2570 and cgd6_2600, have no predicted targets below 
statistical threshold. Most of these putative targets have a 
bimodal expression pattern, peaking at 12 and 36 h 
post-infection (data not shown). ApiAP2 proteins cgd8_3130 and 
cgd4_600 are expressed during these time points, and thus 
could plausibly be involved in regulation of these genes. 

Figure 6. Expression patterns of genes containing overrepresented C. 
parvum G-box motifs and potential G-box-binding ApiAP2 regulators. (A) 
The PBM-determined binding motif as well as expression data for each 
G-box-binding AP2 domain is displayed for seven time points from 2 to 
72 h across the in vitro life cycle (expression data from (52)). (B) Two G- 
box motifs were identified as overrepresented in upstream regions of C. 
parvum genes clustered by expression profile (49). The number of clusters, 
and the number of genes having the motif out of those clusters is shown. 
(C) Expression data for all genes in clusters identified as having overrep- 
resented G-box motifs. Genes with overrepresented G-box motifs are ex- 
pressed across the life cycle; there is a G-box-binding ApiAP2 protein ex- 
pressed at each of those time points, suggesting ApiAP2s could be driving 
expression of these genes. 



ApiAP2 network evolution: comparisons of orthologous and 
lineage-specific ApiAP2s and their regulatory targets 

Behnke et al. (38) found that genes expressed throughout 
the T. gondii cell cycle define subtranscriptomes expressed 
in two separate waves: genes responsible for basal processes, 
such as DNA replication, protein translation and glycolysis, 
and genes specific to apicomplexan processes, such as those 
involved in invasion or immune evasion (38). They noted 
that 24 ApiAP2 proteins are expressed in a cascade across 
the cell cycle. These findings raise the intriguing possibility 
that the evolutionary history of AP2 domains is somehow 
correlated with the evolutionary history of their regulatory 
targets — i.e. that ancestral or pan-apicomplexan AP2 do- 
mains might be responsible for regulating basal housekeep- 
ing processes, while lineage-specific AP2 domains might 
regulate apicomplexan-specific processes. To further 
investigate this possibility, we used a modified version of the 
algorithm that Campbell et al. (37) developed to predict 
regulatory targets (which incorporates genome-wide expression 
data and the presence of AP2 binding motifs in upstream 
regions) for a number of C. parvum AP2 domains. We 
selected lineage-specific and orthologous AP2 domains from 
both C. parvum and P. falciparum and evaluated the cate- 
gory composition of their predicted target genes (see Ma- 
terials and Methods). However, we did not find a signifi- 
cant correlation between evolutionary class of AP2 domain 
and putative targets in either organism (Supplementary File 
S6, Figure S4). Comparison of putative targets between 
ancestral or pan-apicomplexan C. parvum AP2 domains and 
their P. falciparum orthologs revealed very little overlap 
between them (Supplementary File S1, Table S10). P. falciparum 
ApiAP2 proteins on average are predicted to regulate 
a much higher percentage of genes. 
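
The two ingredients named above, motif presence in the upstream region and agreement between expression profiles, can be combined in a minimal sketch. This is not the algorithm of Campbell et al. (37) or the modified version used here; the function names, the correlation cutoff `min_r` and the toy sequences and profiles are all hypothetical and serve only to show the general idea.

```python
import re
from math import sqrt

COMPLEMENT = str.maketrans("ACGT", "TGCA")
# G-box consensus from the text: 5'-G[T/C]GGGG-3'
G_BOX = re.compile(r"G[TC]GGGG")


def has_motif(upstream, pattern=G_BOX):
    """True if the motif occurs on either strand of the upstream region."""
    seq = upstream.upper()
    rev = seq.translate(COMPLEMENT)[::-1]
    return bool(pattern.search(seq) or pattern.search(rev))


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def candidate_targets(regulator_profile, genes, min_r=0.8):
    """genes maps gene id -> (upstream sequence, expression profile).

    A gene is kept if it carries the motif and its profile correlates
    with the regulator at or above min_r (an assumed, arbitrary cutoff).
    """
    return sorted(
        gene_id
        for gene_id, (upstream, profile) in genes.items()
        if has_motif(upstream) and pearson(regulator_profile, profile) >= min_r
    )


# Toy data (invented): seven time points, as in the in vitro time course.
regulator = [1, 2, 3, 4, 5, 6, 7]
genes = {
    "geneA": ("AATTGTGGGGAT", [1, 2, 3, 4, 5, 6, 7]),      # motif, correlated
    "geneB": ("AATTATCCCCACTT", [7, 6, 5, 4, 3, 2, 1]),    # motif, anti-correlated
    "geneC": ("AATTAATTAATT", [1, 2, 3, 4, 5, 6, 7]),      # no motif
    "geneD": ("TTCCCCGCAA", [2, 4, 6, 8, 10, 12, 14]),     # motif on reverse strand
}
print(candidate_targets(regulator, genes))
```

With the toy data, only geneA and geneD pass both filters. A simple positive-correlation filter of this kind would miss targets whose expression lags or opposes that of the regulator.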



Conservation between C. parvum and C. hominis ApiAP2s 
and their putative targets 

We find that although the C. parvum and C. hominis genome 
sequences are 97% identical (47), the ApiAP2s themselves 
do not display this average similarity. 18/35 C. parvum do- 
mains are 100% identical to domains in C. hominis; another 
eight domains have 80-99% aa identity; three domains do 
not appear to have orthologs in C. hominis and the remain- 
ing six have similarities as low as 43%. While some of these 
differences can probably be explained by assembly status of 
the genomes, the results are intriguing. 

Few C. hominis expression data exist and no protein 
binding data exist; thus, regulatory targets for C. hominis 
ApiAP2 proteins could not be predicted. Instead, we com- 
pared the upstream regions of the putative targets of two 
C. parvum ApiAP2s that share 100% identity with their 
C. hominis orthologs, cgd8_3230 and cgd5_4250. Cgd8_3230 has 
15 predicted targets with orthologs in C. hominis. Of these 
15 orthologous pairs of upstream regions, seven are >90% 
identical across 90% or more of their length; five are >90% 
identical across 50% or more of their length and the remain- 
ing three are >90% identical across <50% of their length. 
Cgd5_4250 has 23 predicted targets with orthologs in C. ho- 
minis. Of these 23 orthologous pairs of upstream regions, 
nine are >90% identical across 90% or more of their length; 
one is >90% identical across >50% of its length and the 
remaining 13 are >90% identical over <50% of their length. 
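
The binning used for these upstream-region comparisons can be written out as a short sketch. The counts are those reported above; the function name, the bin labels and the treatment of the bin edges as inclusive lower bounds are assumptions made for illustration.

```python
from collections import Counter


def coverage_bin(identical_fraction):
    """Bin an upstream-region pair by the fraction of its length
    that aligns at >90% identity (bin edges are an assumption)."""
    if identical_fraction >= 0.9:
        return ">=90% of length"
    if identical_fraction >= 0.5:
        return ">=50% of length"
    return "<50% of length"


# Counts reported in the text for the two ApiAP2s examined.
reported = {
    "cgd8_3230": Counter(
        {">=90% of length": 7, ">=50% of length": 5, "<50% of length": 3}
    ),
    "cgd5_4250": Counter(
        {">=90% of length": 9, ">=50% of length": 1, "<50% of length": 13}
    ),
}

# Each set of bins should add up to the number of orthologous target pairs.
totals = {regulator: sum(bins.values()) for regulator, bins in reported.items()}

# Share of target pairs whose upstream regions are conserved over <50% of length.
low_share = {
    regulator: bins["<50% of length"] / totals[regulator]
    for regulator, bins in reported.items()
}

print(totals)
print({k: round(v, 2) for k, v in low_share.items()})
```

The bins sum to 15 and 23 target pairs, and the poorly conserved class accounts for 3 of 15 cgd8_3230 targets but 13 of 23 cgd5_4250 targets.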



The ApiAP2 expression cascade is conserved in C. parvum 

Unlike in Plasmodium spp. and Toxoplasma, a number of 
other possible sequence-specific transcription factor fam- 
ilies have been detected in the C. parvum genome (sum- 
marized in (49)), some of which are absent in other api- 
complexans (E2F, for example). The ratio of available pu- 
tative C. parvum transcription factors to regulate target 
genes (~1:340) is much higher than the P. falciparum 
ratio (~1:800), due both to the lower gene count in C. parvum 
and a higher absolute number of possible transcription fac- 
tors (19). We have also determined that the E2F binding 
motif is one of the most overrepresented motifs in the up- 
stream regions of C. parvum genes (49). Given these ob- 
servations, it might be expected that C. parvum is less re- 
liant on the ApiAP2 family for transcriptional regulation 
than P. falciparum and other apicomplexans (see Supple- 
mentary File S7 for C. parvum gene cluster upstream mo- 
tif co-occurrence (49) and clusters with E2F motifs, but no 
known AP2 binding motif). However, expression data for 
each predicted C. parvum ApiAP2 protein indicate that the 
expression cascade observed across the P. falciparum blood 
stage (37) and across the T. gondii cell cycle (38) is conserved 
in C. parvum (Supplementary File S6, Figure S5), though 
putative orthologous ApiAP2s do not necessarily appear at 
similar temporal/developmental windows in the cascades 
(to the extent they can be correlated). 



DISCUSSION 

We identified and characterized the family of C. parvum 
AP2 domains by experimentally determining their DNA 
sequence targets. We then used this information to ex- 
amine ApiAP2 regulation in a kingdom-wide context by 
performing evolutionary analyses of the distribution of, 
and relationships between, AP2 domains, many of which 
also have experimental binding-specificity data. Phyloge- 
nies constructed from AP2 domains spanning chromalveo- 
lates and extant representatives of their endosymbionts in- 
dicate a distinct divide between AP2s found in the plants, 
stramenopiles, deep alveolates, such as ciliates and dinoflag- 
ellates, and those found in the Apicomplexa. The perkinsid 
AP2 domains group confidently with the apicomplexans to 
the exclusion of other chromalveolates. Some domains are 
orthologous, spanning several apicomplexan taxa and thus, 
must predate speciation events. 

Our results suggest that by whichever manner AP2 do- 
mains came to reside in the apicomplexan/perkinsid an- 
cestor (mobile element invasion, transfer from an algal 
endosymbiont or some mixture of these events), perkin- 
sid and apicomplexan AP2 domains share a common ori- 
gin. Based on our homology analyses, we propose that 
there were five to six progenitor domains arising from the 
acquisition event which occurred sometime between the 
split from dinoflagellates and the appearance of the 
latest perkinsid/apicomplexan ancestor. The domains in the 
perkinsid and apicomplexan lineages then amplified inde- 
pendently. The apicomplexan ancestor likely possessed 10- 
18 domains (or a maximum of 10-18 ApiAP2 proteins, 
an estimate in approximate agreement with the maximum 
of nine proteins proposed in (8)). A more precise estimate 
will require additional, diverse sampling across the phy- 
lum as well as additional structural analysis of the domain. 
Though some domains present in both apicomplexans and 
perkinsids are ancestral, domains spanning other combina- 
tions of taxa may be either ancestral or have been lost in 
a few of the extant lineages, or they may have arisen as a 
result of recent amplification. The most striking amplifica- 
tions have occurred in the coccidian and Plasmodium lin- 
eages, with anywhere from 42 to 69 of the ~90 coccidian 
domains and 14 to 29 of the ~50 Plasmodium domains be- 
ing lineage-specific. Due to the thresholds used, it is a pos- 
sibility that we have not detected all AP2 domains, or that 
we have designated some weakly homologous domains as 
AP2s when they are not functional DNA-binding domains. 
Interestingly, several high-scoring predicted domains had 
no detected DNA binding motifs (such as cgd6_5320_D1 
through D4, cgd4_1110_D2 and cgd6_1140; Figure 3; 
Supplementary File S1, Table S4). It is also possible that we 
have not detected all binding motifs for C. parvum AP2s, 
whether through failure to capture critical binding residues 
in our constructs or other experimental shortcoming. How- 
ever, other AP2 domains from Plasmodium also have no 
detected DNA binding (37), one of which is orthologous 
to cgd6_5320_D1. These results continue to suggest that 
some predicted apicomplexan AP2 domains may function 
outside the context of binding DNA. Alternatively, it has 
been suggested that AP2 domains may be discriminators 
of methylated DNA (57); it is possible that these domains 
are DNA-binding, but our assay did not reflect subtle DNA 
modifications necessary for binding to occur. 

The current lack of continuous in vitro propagation and 
molecular genetic tools in Cryptosporidium imposes a crit- 
ical barrier to further functional characterization of pre- 
dicted ApiAP2 transcription factors and their putative reg- 
ulatory targets. Our target predictions are based on gene 
expression data from the limited in vitro life cycle. Addi- 
tional clusters and upstream patterns will undoubtedly ap- 
pear when in vivo data are available. It is also important 
to note that although proteomics data from the very early 
stages of the C. parvum life cycle are available (58,59), there 
are no proteomics data for the majority of the life cycle. 
Thus, we do not know how closely mRNA expression in- 
dicates protein expression in C. parvum, and the expecta- 
tion that ApiAP2 mRNA expression profiles should cor- 
relate highly with those of predicted target genes may be 
flawed. The correlation between mRNA and protein expres- 
sion in P. falciparum was found to be moderately positive, 
though a delay has been observed for several genes, indi- 
cating post-transcriptional regulatory mechanisms in Plas- 
modium (10,60). Unlike what has previously been indicated 
for P. falciparum ApiAP2s (37), C. parvum ApiAP2 mRNA 
expression does not correlate well with predicted target gene 
expression profiles in many cases (data not shown). The up- 
stream sequence database used to mine for putative tran- 
scription factor binding sites is greatly affected by the sta- 
tus of the annotation, and untranslated regions are largely 
undefined in C. parvum. Though it has been suggested that 
UTRs may overlap with coding regions in highly compact 
genomes, such as that of C. parvum (61), the prevalence of 
this phenomenon has not been established, and we did not 
search any coding regions in the construction of our up- 
stream sequence database. It should also be noted that we 
have far fewer time points over which expression data were 
measured (seven time points post-infection spread over 72 
h versus 48 hourly time points for P. falciparum), and we 
do not have the resolution nor as much statistical power in 
target prediction as has been achieved in P. falciparum (37). 

Our ApiAP2 network analysis, based on a combination 
of in vitro (PBMs) and computational data, lays the foun- 
dation for further exploration of transcriptional regulation 
in the absence of molecular genetic tools. Even when con- 
sidering model organisms for which there are a myriad of 
genetic tools, few large transcription factor family networks 
have been characterized in depth (37,55,62-66). Here, we 
have presented evidence that ApiAP2s are likely major play- 
ers in C. parvum transcriptional regulation, namely: (1) An 
ApiAP2 regulatory cascade is conserved in C. parvum, and 
(2) C. parvum ApiAP2s bind a diverse set of motifs, many 
of which are conserved with P. falciparum and overrepre- 
sented upstream of many co-expressed gene clusters. The 
conservation of a putative ApiAP2 regulatory cascade de- 
spite complete reordering of orthologous ApiAP2 expres- 
sion between C. parvum and P. falciparum further suggests 
extensive ApiAP2 network rewiring over its evolutionary 
history. Intriguingly, even with the much smaller time scale 
between C. parvum and C. hominis (genome sequences 97% 
identical), we find that orthologous ApiAP2 proteins them- 
selves, as well as orthologous upstream regions of predicted 
target genes are not absolutely conserved. Some of these 
differences may be due to differences in assembly status, but 
others likely indicate divergences that may play a role in 
the differing host range and pathogenicity between the two 
species. Questions regarding divergence between C. parvum 
and C. hominis can be more fully addressed as more C. ho- 
minis genome sequence and expression data become avail- 
able. In conjunction with our phylogenetic analyses, these 
results contribute to the beginnings of a framework for 
understanding ApiAP2 regulation in other apicomplexans. 
Binding motifs have been identified for several members of 
a single AP2 domain ortholog group (PF14_0633 in P. fal- 
ciparum, cgd2_3490 in C. parvum, TGME49_110950 in T. 
gondii and AP2-Sp in P. berghei; (16,38,41)), all of which 
bind the 5'-TGCAT-3' motif. Our results build on these 
previous observations. Putatively orthologous domains have 
conserved binding specificities between two of the most dis- 
tantly related apicomplexans, P. falciparum and C. parvum, 
indicating that binding specificities can be inferred by or- 
thology. 

We have noted that orthologous AP2 domains often have 
conserved DNA-binding motifs, yet the putative networks 
of target genes are vastly different between orthologous 
AP2s, with very few shared targets. Our broadscale com- 
parisons of ApiAP2 network composition between P. falci- 
parum and C. parvum suggest that there is no relationship 
between evolutionary class of AP2 domain and evolution- 
ary class of predicted targets. If network divergence does 
not appear to be driven by evolution of the ApiAP2 pro- 
tein binding specificities, divergence could instead be driven 
by genome rearrangements, through shuffling, ablation and 
creation of cognate cis elements upstream of completely dif- 
ferent sets of genes. Apicomplexa have undergone a strik- 
ing degree of genome rearrangement, with no three genes 
found together, in the same order, across the phylum; even 
in closely related lineages, such as Plasmodium and the Piro- 
plasmida (~300 my divergence time, (67)) synteny is rare 
(40). Regulatory network evolution by way of transcription 
factor binding site turnover has been documented in several 
cases in yeast as well as animals (reviewed in (68)). 

Transcription factor substitution may also play a role in 
ApiAP2 network divergence. We previously reported evi- 
dence of substitution in the ribosomal protein regulon be- 
tween a P. falciparum G-box-binding AP2 and C. parvum 
E2F (49), and further evidence suggests multiple transcrip- 
tion factor handoffs, as yet another AP2 binding site, 5'- 
TGCAT-3', is conserved upstream of T. gondii and N. can- 
inum ribosomal genes (69). Ribosomal gene regulon tran- 
scription factor substitution has also been noted in yeast 
(70,71). Campbell et al. (37) reported extensive divergence 
between predicted orthologous ApiAP2 regulons in 
P. falciparum, P. vivax and P. yoelii, indicating that there is 
extensive network divergence even on relatively small 
evolutionary time scales (~120 million years). Conservation of 
transcription factor binding in the face of extensive regulon 
divergence has been noted across several organisms (72-75). 

The function of ApiAP2 proteins outside of the AP2 
domain(s) itself is unclear, though the rest of the protein 
presumably has some involvement in facilitating protein- 
protein interactions. Yeast-two-hybrid studies have indi- 
cated that some P. falciparum ApiAP2 proteins interact with 
each other, as well as with other regulatory proteins, such 
as the histone acetyltransferase Gcn5 (76). Structural 
studies of a P. falciparum ApiAP2 (PF14_0633) demonstrate that 
AP2 domains can dimerize to bind DNA (56). We previ- 
ously reported that clustered C. parvum gene expression pat- 
terns cannot be attributed to the presence of any one type 
of upstream motif (49). Further protein-interaction stud- 
ies on ApiAP2 proteins are needed to establish the degree 
to which ApiAP2 trans regulatory environments are con- 
served. Rewiring of transcriptional regulatory networks via 
evolving combinatorial interactions has also been reported 
in yeast (reviewed in (68)). 

Many C. parvum AP2 domains bind redundant motifs, 
and the majority of C. parvum AP2 domains bind only one 
motif. Thus, C. parvum ApiAP2 regulation does not appear 
to be as multifaceted as is suggested in P. falciparum (37). 
The presence of additional non-ApiAP2 transcription fac- 
tors in the C. parvum genome may explain the decreased 
diversity of ApiAP2 binding motifs. We noted previously 
that the E2F motif is the most abundantly overrepresented 
motif in the upstream regions of the C. parvum genome, be- 
ing found upstream of 161 of 200 predicted co-regulated 
gene clusters (49). E2Fs are notably absent in Plasmodium 
and other apicomplexans (19), and they are also among the 
most ancient transcription factor families that can be traced 
back to the last eukaryotic common ancestor (as well as 
Myb, C2H2 zinc finger, bZIP and AT-hook domains, most 
of which are present across the Apicomplexa) (18). It is pos- 
sible that the three predicted E2F transcription factors and 
their two DP1 dimerization partners are responsible for a 
disproportionate amount of the transcriptional regulation, 
such that C. parvum is less reliant on ApiAP2s. The appar- 
ent redundancy in C. parvum ApiAP2 binding motifs may 
also be important for stage-specific transcriptional regula- 
tion, as ApiAP2s binding the same or similar motifs are 
expressed at various points across the life cycle. While we 
identified several AP2 domains that can potentially bind 
two predicted C. parvum regulatory motif families (49), the 
function of seven of the remaining overrepresented motif 
families is still unknown. Several players in C. parvum tran- 
scriptional regulation have yet to be identified. The mecha- 
nisms by which AP2 domain-containing proteins came to 
regulate the vast majority of genes in many apicomplex- 
ans beginning from just a few, or perhaps a single, verti- 
cally inherited factor are likely varied, involving a combina- 
tion of modalities. C. parvum, with its more diverse comple- 
ment of transcription factor families and possible reduced 
reliance on ApiAP2 proteins, offers clues to the ancestral 
state of apicomplexan transcriptional regulation, pre-AP2 
domination. 